Microbiome data manipulation
Processing phyloseq objects
Instructions to manipulate microbiome data sets using tools from the phyloseq package and some extensions from the microbiome package, including subsetting, aggregating and filtering.
Load example data:
library(phyloseq)
library(microbiome)
library(knitr)
# Load example data
data(atlas1006)
# Let us rename the example data (which is a phyloseq object)
pseq <- atlas1006Retrieving data elements from a phyloseq object
A phyloseq object contains OTU table (taxa abundances), sample metadata, taxonomy table (mapping between OTUs and higher-level taxonomic classifications), and phylogenetic tree (relations between the taxa). Some of these are optional.
Pick metadata as data.frame:
meta <- meta(pseq)Taxonomy table:
taxonomy <- tax_table(pseq)Abundances for taxonomic groups (‘OTU table’) as a TaxaxSamples matrix:
# Absolute abundances
otu.absolute <- abundances(pseq)
# Relative abundances
otu.relative <- abundances(pseq, "compositional")Melting phyloseq data for easier plotting:
df <- psmelt(pseq)
kable(head(df))| OTU | Sample | Abundance | age | gender | nationality | DNA_extraction_method | project | diversity | bmi_group | subject | time | sample | Phylum | Genus | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 113110 | Prevotella melaninogenica et rel. | Sample-448 | 944002 | 54 | female | CentralEurope | o | 18 | 5.98 | lean | 448 | 0 | Sample-448 | Bacteroidetes | Prevotella melaninogenica et rel. |
| 113015 | Prevotella melaninogenica et rel. | Sample-360 | 902034 | 45 | female | CentralEurope | o | 13 | 5.49 | severeobese | 360 | 0 | Sample-360 | Bacteroidetes | Prevotella melaninogenica et rel. |
| 112747 | Prevotella melaninogenica et rel. | Sample-190 | 862870 | 34 | female | CentralEurope | r | 7 | 6.06 | lean | 190 | 0 | Sample-190 | Bacteroidetes | Prevotella melaninogenica et rel. |
| 113109 | Prevotella melaninogenica et rel. | Sample-743 | 852350 | 52 | male | US | NA | 19 | 5.21 | obese | 743 | 0 | Sample-743 | Bacteroidetes | Prevotella melaninogenica et rel. |
| 112944 | Prevotella melaninogenica et rel. | Sample-366 | 851147 | 52 | female | CentralEurope | o | 15 | 5.63 | obese | 366 | 0 | Sample-366 | Bacteroidetes | Prevotella melaninogenica et rel. |
| 113639 | Prevotella melaninogenica et rel. | Sample-375 | 844482 | 45 | female | CentralEurope | o | 16 | 5.64 | severeobese | 375 | 0 | Sample-375 | Bacteroidetes | Prevotella melaninogenica et rel. |
Sample operations
Sample names and variables
head(sample_names(pseq))## [1] "Sample-1" "Sample-2" "Sample-3" "Sample-4" "Sample-5" "Sample-6"
Total OTU abundance in each sample
s <- sample_sums(pseq)Abundance of a given species in each sample
head(abundances(pseq)["Akkermansia",])## Sample-1 Sample-2 Sample-3 Sample-4 Sample-5 Sample-6
## 1319 2299 29980 3824 2133 864
Select a subset by metadata fields:
pseq.subset <- subset_samples(pseq, nationality == "AFR")Select a subset by providing sample names:
# Check sample names for African Females in this phyloseq object
s <- rownames(subset(meta(pseq), nationality == "AFR" & sex == "Female"))
# Pick the phyloseq subset with these sample names
pseq.subset2 <- prune_samples(s, pseq)Pick samples at the baseline time points only:
pseq0 <- baseline(pseq)Data transformations
The microbiome package provides a wrapper for standard sample/OTU transforms. For arbitrary transforms, use the transform_sample_counts function in the phyloseq package. Log10 transform is log(1+x) if the data contains zeroes. Also “Z”, “clr”, “hellinger”, and “shift” are available as common transformations. Relative abundances (note that the input data needs to be in absolute scale, not logarithmic!):
pseq.compositional <- microbiome::transform(pseq, "compositional")Variable operations
Sample variable names
sample_variables(pseq)## [1] "age" "gender"
## [3] "nationality" "DNA_extraction_method"
## [5] "project" "diversity"
## [7] "bmi_group" "subject"
## [9] "time" "sample"
Pick values for a given variable
head(get_variable(pseq, sample_variables(pseq)[1]))## [1] 28 24 52 22 25 42
Assign new fields to metadata
# Calculate diversity for samples
div <- global(pseq, index = "shannon")
# Assign the estimated diversity to sample metadata
sample_data(pseq)$diversity <- divTaxa operations
Number of taxa
n <- ntaxa(pseq)Most abundant taxa
topx <- top_taxa(pseq, n = 10)Names
ranks <- rank_names(pseq) # Taxonomic levels
taxa <- taxa(pseq) # Taxa names at the analysed levelSubset taxa
pseq.bac <- subset_taxa(pseq, Phylum == "Bacteroidetes")Prune (select) taxa:
# List of Genera in the Bacteroideted Phylum
taxa <- map_levels(NULL, "Phylum", "Genus", pseq)$Bacteroidetes
# With given taxon names
ex2 <- prune_taxa(taxa, pseq)
# Taxa with positive sum across samples
ex3 <- prune_taxa(taxa_sums(pseq) > 0, pseq)Filter by user-specified function values (here variance):
f <- filter_taxa(pseq, function(x) var(x) > 1e-05, TRUE)List unique phylum-level groups:
head(get_taxa_unique(pseq, "Phylum"))## [1] "Actinobacteria" "Bacilli"
## [3] "Proteobacteria" "Verrucomicrobia"
## [5] "Bacteroidetes" "Clostridium cluster XV"
Pick the taxa abundances for a given sample:
samplename <- sample_names(pseq)[[1]]
# Pick abundances for a particular taxon
tax.abundances <- abundances(pseq)[, samplename]Merging operations
Aggregate taxa to higher taxonomic levels. This is particularly useful if the phylogenetic tree is missing. When it is available, see merge_samples, merge_taxa and tax_glom).
Put the less abundant taxa together in the “Other” category:
pseq2 <- aggregate_taxa(pseq, "Phylum", top = 5) Merge the desired taxa into “Other” category. Here, we merge all Bacteroides groups into a single group named Bacteroides.
pseq2 <- merge_taxa2(pseq, pattern = "^Bacteroides", name = "Bacteroides") Merging phyloseq objects with the phyloseq package:
merge_phyloseq(pseqA, pseqB)Rarification
pseq.rarified <- rarefy_even_depth(pseq)Taxonomy
Convert between taxonomic levels (here from Genus (Akkermansia) to Phylum (Verrucomicrobia):
m <- map_levels("Akkermansia", "Genus", "Phylum", tax_table(pseq))
print(m)## [1] "Verrucomicrobia"
Metadata
Visualize frequencies of given factor (sex) levels within the indicated groups (group):
res <- plot_frequencies(sample_data(pseq), "bmi_group", "gender")
print(res$plot)# Retrieving the actual data values:
kable(head(res$data), digits = 2)| Groups | Factor | n | pct |
|---|---|---|---|
| underweight | female | 21 | 91.30 |
| underweight | male | 2 | 8.70 |
| lean | female | 304 | 61.66 |
| lean | male | 189 | 38.34 |
| overweight | female | 102 | 50.00 |
| overweight | male | 102 | 50.00 |
Custom functions are provided to cut age or BMI information into discrete classes.
group_bmi(c(22, 28, 31), "standard")## [1] lean overweight obese
## Levels: underweight lean overweight obese severe morbid super
group_age(c(17, 41, 102), "decades")## [1] [10,20) [40,50) [100,110]
## 10 Levels: [10,20) [20,30) [30,40) [40,50) [50,60) [60,70) ... [100,110]
Add metadata to a phyloseq object. For reproducibility, we just use the existing metadata in this example, but this can be replaced by another data.frame (samples x fields).
# Example data
data(dietswap)
pseq <- dietswap
# Pick the existing metadata from a phyloseq object
# (or retrieve this from another source)
df <- meta(pseq)
# Merge the metadata back in the phyloseq object
pseq2 <- merge_phyloseq(pseq, sample_data(df))